Combination of Machine Learning Methods for Optimum Chinese Word Segmentation
Authors
Abstract
This article presents our recent work for participation in the Second International Chinese Word Segmentation Bakeoff. Our system performs two procedures: out-of-vocabulary extraction and word segmentation. We compose three out-of-vocabulary extraction modules, all based on character-based tagging but with different classifiers: maximum entropy, support vector machines, and conditional random fields. We also compose three word segmentation modules: character-based tagging with a maximum entropy classifier, a maximum entropy Markov model, and conditional random fields. All modules are based on previously proposed methods. We submitted three systems, each a different combination of these modules.
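To make the character-based tagging formulation concrete, the sketch below shows the usual B/M/E/S position-tag scheme and a typical character-window feature template. It illustrates the general technique only, not the authors' implementation; the helper names are hypothetical.

    # Illustrative sketch of character-based tagging for word segmentation.
    # Not the authors' code; the B/M/E/S tag scheme and the feature template
    # are assumptions based on common practice.

    def bmes_tags(words):
        """Convert a gold word sequence into per-character B/M/E/S tags."""
        tags = []
        for w in words:
            if len(w) == 1:
                tags.append("S")
            else:
                tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
        return tags

    def char_features(chars, i):
        """Character-window features for position i (a typical template)."""
        pad = lambda j: chars[j] if 0 <= j < len(chars) else "<PAD>"
        return {
            "c-1": pad(i - 1), "c0": pad(i), "c+1": pad(i + 1),
            "c-1c0": pad(i - 1) + pad(i), "c0c+1": pad(i) + pad(i + 1),
        }

    def tags_to_words(chars, tags):
        """Decode a B/M/E/S tag sequence back into words."""
        words, buf = [], ""
        for c, t in zip(chars, tags):
            buf += c
            if t in ("E", "S"):
                words.append(buf)
                buf = ""
        if buf:
            words.append(buf)
        return words

    # Example: the gold segmentation ["我", "喜欢", "北京"] yields the tags
    # ["S", "B", "E", "B", "E"].

Any of the classifiers named in the abstract (maximum entropy, SVM, CRF) can be trained on such (features, tag) pairs; how the submitted systems combine the module outputs is not specified in this abstract.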
Similar resources
A Simple and Effective Closed Test for Chinese Word Segmentation Based on Sequence Labeling
In many Chinese text processing tasks, Chinese word segmentation is a vital and required step. Various methods based on machine learning algorithms have been proposed to address this problem in previous studies. To achieve high performance, many of these studies combined external resources with various machine learning algorithms to aid segmentation. The goal of this paper is to construct...
Experimental Comparison of Discriminative Learning Approaches for Chinese Word Segmentation
Natural language processing tasks assume that the input is tokenized into individual words. In languages like Chinese, however, such tokens are not available in the written form. This thesis explores the use of machine learning to segment Chinese sentences into word tokens. We conduct a detailed experimental comparison between various methods for word segmentation. We have built two Chinese wor...
Text Window Denoising Autoencoder: Building Deep Architecture for Chinese Word Segmentation
Deep learning is the new frontier of machine learning research, which has led to many recent breakthroughs in English natural language processing. However, there are inherent differences between Chinese and English, and little work has been done to apply deep learning techniques to Chinese natural language processing. In this paper, we propose a deep neural network model: text window denoising ...
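As a rough illustration of the idea of a text-window denoising autoencoder (a sketch assuming PyTorch; the corruption scheme, layer sizes, and the class name TextWindowDAE are assumptions, not the model from the paper):

    # Minimal text-window denoising autoencoder sketch (PyTorch assumed).
    # Hyperparameters and the noise scheme are illustrative only.
    import torch
    import torch.nn as nn

    class TextWindowDAE(nn.Module):
        def __init__(self, vocab_size, emb_dim=50, window=5, hidden=300):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.Sequential(nn.Linear(window * emb_dim, hidden), nn.Tanh())
            self.decoder = nn.Linear(hidden, window * emb_dim)

        def forward(self, char_ids, drop_prob=0.2):
            x = self.emb(char_ids)                      # (batch, window, emb_dim)
            clean = x.flatten(1)                        # reconstruction target
            mask = (torch.rand_like(x[..., :1]) > drop_prob).float()
            noisy = (x * mask).flatten(1)               # corrupt: drop whole characters
            hidden = self.encoder(noisy)
            recon = self.decoder(hidden)
            return recon, clean

    # Training would minimize e.g. nn.MSELoss()(recon, clean); the learned
    # hidden layer could then initialize a deeper tagging network.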
A domain adaption Word Segmenter For Sighan Bakeoff 2010
We present a Chinese word segmentation system that ran on the closed track of the simplified Chinese word segmentation task of the CIPS-SIGHAN-CLP 2010 bakeoff. Our segmenter was built using an HMM. To fulfill the cross-domain segmentation task, we use a semi-supervised machine learning method to obtain the HMM model. Finally, we obtain the mean result over four domains: P=0.719, R=0.72
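For context, an HMM segmenter of this kind typically uses B/M/E/S position tags as hidden states, characters as emissions, and Viterbi decoding. The sketch below illustrates the decoding step only, with placeholder probability tables rather than the semi-supervised estimates described above.

    # Viterbi decoding sketch for a BMES-tag HMM word segmenter.
    # Transition/emission tables are placeholders; in practice they would be
    # estimated from segmented (and, here, additional unsegmented) text.
    import math

    STATES = ["B", "M", "E", "S"]

    def viterbi(chars, start_p, trans_p, emit_p, floor=1e-8):
        """Return the most likely B/M/E/S tag sequence for a character list."""
        V = [{s: math.log(start_p.get(s, floor)) +
                 math.log(emit_p[s].get(chars[0], floor)) for s in STATES}]
        back = [{}]
        for i in range(1, len(chars)):
            V.append({})
            back.append({})
            for s in STATES:
                best_prev, best_score = max(
                    ((p, V[i - 1][p] + math.log(trans_p[p].get(s, floor)))
                     for p in STATES), key=lambda x: x[1])
                V[i][s] = best_score + math.log(emit_p[s].get(chars[i], floor))
                back[i][s] = best_prev
        # Backtrace from the best final state.
        last = max(STATES, key=lambda s: V[-1][s])
        tags = [last]
        for i in range(len(chars) - 1, 0, -1):
            tags.append(back[i][tags[-1]])
        return list(reversed(tags))

In practice the transition table would also forbid impossible tag pairs such as B followed by B or S followed by E.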
A Long Dependency Aware Deep Architecture for Joint Chinese Word Segmentation and POS Tagging
Long-term context is crucial to the joint Chinese word segmentation and POS tagging (S&T) task. However, most machine learning based methods extract features from a window of characters. Due to the limitation of the window size, these methods cannot exploit long-distance information. In this work, we propose a long dependency aware deep architecture for the joint S&T task. Specifically, to simulate...
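To illustrate the contrast with fixed-window feature extraction, the sketch below shows a bidirectional LSTM character tagger that conditions each tag on the whole sentence. It assumes PyTorch and is not the architecture proposed in the paper; dimensions and the joint tag set are assumptions.

    # Bidirectional LSTM character tagger sketch for joint S&T.
    # Illustrative only; a joint tag set would pair segmentation and POS
    # labels (e.g. "B-NN", "E-VV", ...).
    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, num_tags)

        def forward(self, char_ids):
            x = self.emb(char_ids)          # (batch, seq_len, emb_dim)
            h, _ = self.lstm(x)             # each position sees the full sentence
            return self.out(h)              # per-character scores over joint tags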
Publication date: 2005